Method for Discrete Gate Sizing in a Netlist

Background of the Invention 

The present invention relates generally to gate sizing in integrated circuit design and, more particularly, to a novel apparatus and method for network-based gate sizing in standard cell design.

In a MOS integrated circuit, one parameter relating to the ability of a driver transistor to charge or discharge a load, $C_L$, is the channel width of the driver transistor, which determines its output resistance, $R$, and hence the RC time constant upon which the switching speed depends. For a constant load, $C_L$, an increase in the channel width of the driver transistor decreases its output resistance, $R$, thereby increasing switching speed. Conversely, for the same load, $C_L$, a decrease in the channel width of the driver transistor increases its output resistance, $R$, thereby decreasing switching speed.

When a timing analysis of the integrated circuit is performed, a faster switching speed at this driver transistor may be required to maintain timing constraints within the MOS circuit. One solution would be simply to increase the channel width of this transistor, for the reasons stated above. However, this transistor may also be a load of a previous transistor in the circuit. Since increasing the channel width of a MOS transistor increases its input gate capacitance, the load seen by the previous transistor increases, resulting in slower switching at the previous stage. Accordingly, timing constraints may then not be met at the previous stage.


In the design of the data paths in a reasonably sized MOS integrated circuit, the smallest design component is generally a logic stage or standard cell, hereinafter referred to as a gate. Each gate is composed of various circuit components that implement its predefined function. The load, $C_L$, referred to above is thus typically the sum of each input gate capacitance, $C_{in}$, seen at an input pin of a gate when that gate is a receiver gate switched by the driver transistor in the example above. Such a gate, of course, also has an output pin, at which the function and size of the driver transistor in the above example are found.

Typically, each gate used in the design has previously been implemented in a library, such that the components within the gate are not subject to further design variations outside of the library implementation. Accordingly, selecting a gate from a library for a required size of its output transistor determines its corresponding input gate capacitance, and vice versa. To provide design flexibility, several variations of each gate in a logic family are typically available from the library. Each of these variations of a particular gate is referred to as a gate size. Accordingly, meeting timing along data paths and maintaining the requisite timing constraints becomes a problem of selecting a gate size for each gate in the circuit.

In Fig. 1 (Prior Art), there is shown a simple MOS circuit 10, which may be a portion of a much larger circuit. The circuit 10 includes a plurality of gates 12₁-12₇, for which there are known library implementations, and for each one of the gates 12₁-12₇ several sizes are available. For purposes of this exemplary circuit 10, it shall be assumed that, for each available size of each one of the gates 12₁-12₇, channel width dependencies between transistors within the gate require that all such channel widths remain proportional as the gate is upsized or downsized. Accordingly, a larger or smaller size for any one of the gates 12₁-12₇ results, respectively, in a larger or smaller input capacitance and in a smaller or larger output resistance.


Should the results of a timing analysis indicate that timing constraints are not met between a driver gate, such as gate 12₁, and its receiver gates, such as gate 12₂ and gate 12₃, faster switching between the driver gate and each receiver gate would be needed. Accordingly, either the size of gate 12₁, as the driver gate, would need to be made larger, or the size of each of gate 12₂ and gate 12₃, as the receiver gates, would need to be made smaller. For the reasons stated above, increasing the size of gate 12₁ decreases its output resistance, allowing faster switching, and decreasing the size of gate 12₂ and gate 12₃ decreases their respective input capacitances, also allowing faster switching. It may also be required that gate 12₁ be made larger while gate 12₂ and gate 12₃ are simultaneously made smaller.


If the size of gate 12₁ is made larger, then its input capacitance, $C_{in}$, also becomes larger due to the increased channel widths in this gate. Accordingly, when gate 12₁ is a receiver gate for a previous driver gate, such as gate 12₄ or gate 12₅, either of these previous stages, if kept the same size, may be unable to switch gate 12₁ quickly enough to maintain timing constraints in the circuit 10.

Similarly, if receiver gate 12₂ and receiver gate 12₃ are each made smaller, then their corresponding output resistances become larger due to the decreased channel widths in each of these gates. Accordingly, when either gate 12₂ or gate 12₃ is a driver gate for a subsequent receiver gate, such as gate 12₆ and gate 12₇, it may be unable to switch the subsequent stage, if that stage is kept the same size, quickly enough to maintain the timing constraints in the circuit 10.

In addition to the switching speed between the driver gate and each receiver gate, there is also a timing delay through the driver gate. Since switching speed is the inverse of delay, the total delay, $\tau$, from the input of a driver gate to the input of each receiver gate may be expressed as

$$\tau = K + R\,C_{in} \qquad (1)$$

wherein $K$ is a delay constant through the driver gate, $R$ is the output resistance of the driver gate and $C_{in}$ is the input capacitance of the receiver gate. It is apparent from the above discussion that the delay constant $K$ and the output resistance $R$ depend upon the driver gate size, $x_{drv}$, and that each input capacitance $C_{in}$ depends upon the receiver gate size, $x_r$.
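The trade-off described above can be made concrete with Eq. (1). The sketch below assumes a hypothetical size model that is not part of the source: a gate of size $x$ has output resistance $R_0/x$ and input capacitance $C_0 x$, with $K$ held fixed. Under that assumption, upsizing the middle gate of a three-gate chain speeds up its own stage but slows the stage that drives it.

```python
# Toy size model (hypothetical, not from the source): a gate of size x has
# output resistance R0 / x and input capacitance C0 * x; K is held constant.
K = 1.0   # intrinsic delay through the driver gate (arbitrary units)
R0 = 4.0  # output resistance at size 1
C0 = 1.0  # input capacitance at size 1


def stage_delay(x_drv: float, x_rcv: float) -> float:
    """Eq. (1), tau = K + R * C_in, for one driver/receiver pair."""
    R = R0 / x_drv       # larger driver -> smaller output resistance
    C_in = C0 * x_rcv    # larger receiver -> larger input capacitance
    return K + R * C_in


def chain_delay(sizes: list[float]) -> float:
    """Total delay along a chain sizes[0] -> sizes[1] -> ... of gates."""
    return sum(stage_delay(a, b) for a, b in zip(sizes, sizes[1:]))


if __name__ == "__main__":
    print(chain_delay([1, 1, 1]))  # 10.0 (5.0 + 5.0)
    print(chain_delay([1, 2, 1]))  # 12.0 (9.0 + 3.0): the upsized gate loads its driver
```

With these numbers the upsized gate's own stage drops from 5.0 to 3.0, but the preceding stage rises from 5.0 to 9.0, so the chain as a whole gets slower.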
It is readily seen from Eq. (1) that several parameters need to be considered when selecting the size of each one of the gates 12₁-12₇ in the circuit 10 to meet timing constraints. In a simple circuit, such as the exemplary circuit 10, selecting a size for each of the gates 12₁-12₇ may be accomplished without much difficulty. However, even here, optimizing timing over every possible combination of sizes for the gates 12₁-12₇ could be unduly burdensome should many sizes exist for each gate. Because of the interdependencies among these parameters with regard to timing within each gate and between gates, it may be appreciated by those skilled in the art that, as the number of gates in an integrated circuit increases, the complexity of selecting gate sizes increases accordingly.

In the prior art, the design of a complex integrated circuit is generally defined by a netlist, which is a set of data used by design automation tools. The problem of determining the optimal size of each instance of a gate in the netlist has been addressed by analyzing the slack at all of the endpoints in a circuit. As is known, slack is the difference between the required time and the arrival time at the endpoint. If the arrival time is later than the required time, the difference is negative. Accordingly, negative slack at an endpoint indicates that the timing requirement is not met at that endpoint. Equivalently, negative slack indicates that the actual delay on the path exceeds the required delay.


It then follows that the worst slack, $WS$, of a circuit with a set, $P$, of paths, $p$, may be expressed in terms of the difference between the path delay, $PathDelay(p)$, and the required delay, $RequiredDelay(p)$, as:

$$WS = -\max_{p \in P}\big(PathDelay(p) - RequiredDelay(p)\big) \qquad (2)$$

The path delay, $PathDelay(p)$, on any path, $p$, is in turn defined as the sum of the timing edge delays, $delay(e)$, over all of the timing edges, $e$, in that path, or

$$PathDelay(p) = \sum_{e \in p} delay(e) \qquad (3)$$
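Eqs. (2) and (3) can be evaluated directly once per-edge delays are known. The sketch below is a minimal illustration; the edge names, delays and required times are hypothetical values chosen so that one path meets timing and the other does not.

```python
from typing import Dict, List


def path_delay(edges: List[str], delay: Dict[str, float]) -> float:
    """Eq. (3): sum of delay(e) over the timing edges e of one path."""
    return sum(delay[e] for e in edges)


def worst_slack(paths: Dict[str, List[str]],
                delay: Dict[str, float],
                required: Dict[str, float]) -> float:
    """Eq. (2): WS = -max over paths of (PathDelay(p) - RequiredDelay(p))."""
    return -max(path_delay(edges, delay) - required[p]
                for p, edges in paths.items())


# Hypothetical example: two paths sharing timing edge "a".
delay = {"a": 2.0, "b": 3.0, "c": 4.0}
paths = {"p1": ["a", "b"], "p2": ["a", "c"]}
required = {"p1": 6.0, "p2": 5.0}
print(worst_slack(paths, delay, required))  # -1.0: p2 misses its requirement by 1.0
```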

wherein a timing edge, $e$, transitions at an input of a driver gate and extends to the input of a receiver gate, as best seen in Fig. 1. Since the delay, $delay(e)$, on each edge, $e$, is then known to depend upon the size of the driver gate and of each receiver gate, as described above with reference to Eq. (1), the size of each of the gate instances in the netlist can be selected to optimize slack. Accordingly, gate sizing for slack optimization can be expressed as finding a vector of gate sizes, $x$, in a solution space, $X$, that minimizes the negative of the worst slack of Eq. (2), $-WS$, or

$$\min_{x \in X}\,(-WS) = \min_{x \in X}\,\max_{p \in P}\big(PathDelay(p) - RequiredDelay(p)\big) \qquad (4)$$



In a typical netlist, a solution to the min/max problem of Eq. (4) is extremely difficult to obtain because of the large number of paths and gate instances in the netlist, compounded by all of the possible combinations of gate sizes for each of the gate instances. For a typical netlist, a solution to Eq. (4) may not be obtainable in a reasonable time.

The problem may be refined by considering the paths in the netlist that have worse slack than other paths, since these paths are more critical to optimize, and by assigning weights to the timing edges in these paths. The timing edge in each of these paths whose slack is the worst slack in its path may be assigned the largest weight in the path. Similarly, the timing edge having the worst slack in the path having the worst slack of all paths may generally be assigned the largest of all weights.

For example, in Fig. 1 a first path may terminate at an endpoint 14 and a second path may terminate at an endpoint 16. The slack, $slk$, at the endpoint 14 on the first path is exemplarily indicated as positive, or $slk > 0$, showing that timing constraints are met, such that the path delay is less than the required delay. However, the slack, $slk$, at the endpoint 16 is exemplarily indicated as negative, or $slk < 0$, showing otherwise. A timing edge 18, transitioning at the input of its driver gate and terminating at the input of gate 12₇, may exemplarily be identified as contributing the worst slack on the second path. Accordingly, the timing edge 18 will receive the largest weight. The weights allow the timing edges with the largest weights to be optimized in preference to the edges with relatively lesser weights. However, it can be readily appreciated by those skilled in the art that obtaining a direct solution to the min/max problem of Eq. (4) for the various combinations of gate sizes along each timing edge in a typical netlist, even when the most critical edges are considered first, remains computationally intensive.



As taught in Chen et al., "Fast and Exact Simultaneous Gate and Wire Sizing by Lagrangian Relaxation," Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design (ICCAD-98), pp. 617-624, ACM/IEEE, Nov. 1998, the min/max problem of Eq. (4) may be solved in the continuous domain after being converted, using Lagrangian relaxation, to the following form:



$$\max_{w}\;\min_{x \in X}\;\sum_{e \in E} w(e)\,delay(e) \qquad (5)$$



wherein $E$ is the set of edges, $e$, in the timing graph, and $w(e)$ is a weight associated with the timing edge $e$. As described in Chen et al., the set of weights, $w$, on the edges, $e$, in the timing graph must satisfy a unit flow condition, i.e., at any node in the timing graph the sum of the weights on all incoming edges must equal the sum of the weights on all outgoing edges. Accordingly, it can be seen from the teachings of Chen et al. in Eq. (5) that the problem of finding a set of gate sizes, $x$, that minimizes worst slack, as set forth in Eq. (4), becomes, for a given set of weights, a problem of finding a set of gate sizes, $x$, that minimizes the sum of weighted delays expressed in Eq. (5), as follows:

$$\min_{x \in X}\;\sum_{e \in E} w(e)\,delay(e) \qquad (6)$$
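The objective of Eq. (6) and the unit flow condition on the weights can both be checked mechanically. The sketch below assumes a timing graph stored as (from-node, to-node) edges and treats primary inputs and endpoints as hypothetical "terminal" nodes exempt from the flow condition; the node names and weight values are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (from-node, to-node) in the timing graph


def weighted_delay_sum(weights: Dict[Edge, float], delay: Dict[Edge, float]) -> float:
    """Objective of Eq. (6) for fixed sizes: sum of w(e) * delay(e)."""
    return sum(w * delay[e] for e, w in weights.items())


def satisfies_unit_flow(weights: Dict[Edge, float],
                        terminals: Iterable[str],
                        tol: float = 1e-9) -> bool:
    """True if, at every non-terminal node, incoming weight equals outgoing weight."""
    inflow: Dict[str, float] = defaultdict(float)
    outflow: Dict[str, float] = defaultdict(float)
    for (u, v), w in weights.items():
        outflow[u] += w
        inflow[v] += w
    internal = (set(inflow) | set(outflow)) - set(terminals)
    return all(abs(inflow[n] - outflow[n]) <= tol for n in internal)


# Hypothetical weights: the more critical branch g1 -> g3 carries more weight.
weights = {("in", "g1"): 1.0, ("g1", "g2"): 0.3, ("g1", "g3"): 0.7,
           ("g2", "out14"): 0.3, ("g3", "out16"): 0.7}
delay = {e: 1.0 for e in weights}
print(satisfies_unit_flow(weights, ["in", "out14", "out16"]))  # True
```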



The use of the minimum sum of weighted delays to optimize gate size can be illustrated qualitatively with reference to Fig. 1. For example, a first timing edge 22, transitioning at an input 20 of gate 12₁ and extending to the input of gate 12₂, has a weight $w(e_{22})$, and a second timing edge 24, also transitioning at the input of gate 12₁ but extending to the input of gate 12₃, has a weight $w(e_{24})$. Since the second timing edge 24 is in the path terminating at endpoint 16, and since this path has a higher criticality as set forth above, the weight $w(e_{24})$ on timing edge 24 is larger than the weight $w(e_{22})$ on timing edge 22. To minimize the sum of weighted delays in Eq. (6), the delay on timing edge 24, $delay(e_{24})$, would therefore need to be minimized, thereby minimizing the more heavily weighted delay product for this edge 24 in Eq. (6).

From the above discussion, the delay, $delay(e_{24})$, on timing edge 24 is known to be a function of the size of gate 12₁ and of a total receiver capacitance, $C_{tot}$, which is the sum of each input capacitance, $C_{in}$, of gate 12₂ and gate 12₃ and the wire capacitance of the net connecting gate 12₁ to gate 12₂ and gate 12₃. Accordingly, the delay expressed in Eq. (1) can be rewritten for any timing edge, $e$, as a function of the driver gate size, $x_{drv}$, and the total receiver capacitance, $C_{tot}$, as

$$delay(e) = f(x_{drv}, C_{tot}) \qquad (7)$$

It can therefore be seen that the minimization of the sum of weighted delays, as set forth in Eq. (6), depends on gate size, such that gate sizes can be obtained that minimize delay on the most heavily weighted edges.

Although the weighted delay gate sizing problem of Eq. (5) is easier to solve than the min/max problem of Eq. (4), the solution to Eq. (5) lies in the continuous domain, i.e., the solution is a continuum of gate sizes for each gate and does not yield a set of gate sizes obtainable from a library. Accordingly, Eq. (5) cannot be used directly in the standard cell methodology, in which, for each gate instance in the netlist, one or more discrete gate sizes are available for selection, as discussed above. However, standard cell methodology is the primary methodology used for the design of complex integrated circuits, especially application specific integrated circuits (ASICs), and it is therefore highly desirable to obtain a gate sizing solution in this methodology that minimizes a sum of weighted delays for gate size optimization.

In order to solve the more practical discrete gate sizing problem, it is known in the art first to obtain a solution to Eq. (5) in the continuous domain and then to use that solution as a starting point to obtain a solution in the discrete domain. Typically, the discrete domain is entered by rounding off the results of the continuous domain, which may disadvantageously produce a result that, instead of minimizing delay on a critical path, actually increases delay on that path.

For example, in Fig. 2 (Prior Art), there is shown a portion of the circuit 10 of Fig. 1, including gates 12₁-12₃, as described above. Each of the gates 12₁-12₃ is a member of a logic family, $LogicFamily$, and in each logic family several gate sizes, $x_{gate}$, are available from the library, such that

$$x_{gate} \in LogicFamily \qquad (8)$$

The logic family for gate 12₁ is exemplarily shown as having three discrete sizes,

$$x_{12_1} \in \{1, 2, 3\} \qquad (9)$$

shown as gate $12_1^{x=1}$, gate $12_1^{x=2}$ and gate $12_1^{x=3}$, and the logic family for gate 12₃ is exemplarily shown as having two discrete sizes,

$$x_{12_3} \in \{1, 2\} \qquad (10)$$

shown as gate $12_3^{x=1}$ and gate $12_3^{x=2}$. It is to be understood that each gate instance may have any number of discrete sizes. A solution may then be obtained for Eq. (5) in the continuous domain, and data obtained relating specifically to the continuous sizes of gate 12₁ and gate 12₃ on timing edge 24.



With further reference to Fig. 3 (Prior Art), there is shown a graph of the continuous domain solution, with continuous sizes $x_{12_3}$ for gate 12₃ on the ordinate and continuous sizes $x_{12_1}$ for gate 12₁ on the abscissa. The data obtained for an exemplary solution to Eq. (5) for timing edge 24 may result in a series of contours 26 about a locus 28. The locus 28 represents an optimal solution for the gate sizes $x_{12_1}$ and $x_{12_3}$ in the continuous domain, and the contours 26 represent increasingly less desirable solutions with each larger contour 26 outward from the locus 28.
Superimposed on the graph of Fig. 3 for gate 12₁ and gate 12₃ are their discrete gate sizes $x_{12_1}$ and $x_{12_3}$, as respectively set forth in Eq. (9) and Eq. (10). As set forth above, a continuous domain solution is used to enter the discrete domain by rounding off the optimal gate sizes, as indicated at the locus 28, to the nearest discrete gate sizes. As visually indicated in Fig. 3, the nearest round-off point for the discrete sizes $x_{12_1}$ and $x_{12_3}$ from the locus 28 is at a data point 30, at which gate 12₁ has a discrete size $x_{12_1} = 1$ and gate 12₃ has a discrete size $x_{12_3} = 2$.

It can readily be seen in the graph of Fig. 3 that, to reach the nearest round-off point at data point 30, five of the contours 26 are crossed and that the contours are closely spaced. Accordingly, the slope of the continuous domain solution to Eq. (5) is relatively steep between the locus 28 and data point 30 and, as stated above, each contour 26 farther away from the locus 28 indicates an increasingly less desirable continuous domain solution.

A more preferable solution for this example would be at a data point 32, at which gate 12₁ has a discrete size $x_{12_1} = 2$ and gate 12₃ has a discrete size $x_{12_3} = 2$. As seen in the graph of Fig. 3, the slope between the locus 28 and data point 32 is far shallower, in that only two contours 26 are crossed. However, since the discrete size $x_{12_1} = 2$ for gate 12₁ is farther from the locus 28 than its smaller size, the rounding used in the prior art would not select the more preferable size.

That the discrete size $x_{12_1} = 2$ for gate 12₁ is more preferable is also apparent from the above example described with reference to Fig. 1. Since timing edge 24 is on the path terminating at endpoint 16 (Fig. 1), and this path was indicated as having a higher criticality, delay on timing edge 24 would be reduced if the larger size for gate 12₁ were used instead of the smaller size suggested by rounding the continuous domain solution.
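The rounding pitfall can be reproduced numerically. The cost surface below is entirely hypothetical (it is not the data behind Fig. 3): it is chosen to be steep when gate 12₁ is undersized and mild when it is oversized, which mimics the closely spaced contours between locus 28 and data point 30. Rounding the continuous optimum picks $(1, 2)$, while an exhaustive check of the discrete sizes of Eqs. (9) and (10) prefers $(2, 2)$.

```python
import itertools


def cost(x1: float, x3: float) -> float:
    """Hypothetical weighted-delay surface (not from the source): steep when
    gate 12_1 is undersized, mild when it is oversized, quadratic in 12_3."""
    return 1.4 / x1 ** 2 + x1 + (x3 - 1.9) ** 2


# Continuous optimum: d/dx1 (1.4 / x1^2 + x1) = 0  =>  x1 = 2.8 ** (1/3) ~ 1.41
x_cont = (2.8 ** (1.0 / 3.0), 1.9)

sizes_12_1 = [1, 2, 3]  # Eq. (9)
sizes_12_3 = [1, 2]     # Eq. (10)


def nearest(value: float, choices: list) -> int:
    return min(choices, key=lambda c: abs(c - value))


# Prior-art entry into the discrete domain: round each size independently.
rounded = (nearest(x_cont[0], sizes_12_1), nearest(x_cont[1], sizes_12_3))
# Exhaustive check over the (small) discrete grid.
best = min(itertools.product(sizes_12_1, sizes_12_3), key=lambda p: cost(*p))

print(rounded, round(cost(*rounded), 2))  # (1, 2) 2.41
print(best, round(cost(*best), 2))        # (2, 2) 2.36
```

Exhaustive search is only feasible here because there are six combinations; the point of the sketch is the failure of independent rounding, not a practical sizing method.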
Summary of the Invention

According to the present invention, a method to select a set of gate sizes for a netlist having a plurality of gates, wherein for each of the gates a number of discrete gate sizes is available for selection, such that the selection minimizes worst slack in the netlist, includes the steps of selecting a current first gate size for each one of the gates, performing a static timing analysis to determine slack, assigning a current weight to each one of the timing edges in the netlist based on the results of the timing analysis, selecting a new gate size for each one of the gates from one of the current gate size and a second gate size from the available gate sizes, wherein the selection of each new gate size minimizes a sum of weighted delays obtained over all timing edges, and re-iterating each of the foregoing steps until an exit criterion is met.

At an initial iteration of the current gate size selecting step, the current gate size is selected to be an initially selected one of the available gate sizes, and at each subsequent iteration of the selecting step the current gate size for each of the gates is the new gate size for the corresponding gate from the immediately prior iteration. In each iteration, the current weight assigned to each edge may be determined from a current worst slack determined by the timing analysis using the current gate sizes. In one particular embodiment of the present invention, the second gate size alternates between a next larger size and a next smaller size in successive iterations. The set of gate sizes selected by the foregoing method is the set from the iteration for which the current worst slack is determined to be minimal.
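The outer loop summarized above can be sketched as a small driver routine. Everything passed in is a hypothetical callback: `worst_slack` stands in for the static timing analysis, `assign_weights` for the weighting step, and `inner_two_size` for the two-size solver. A fixed iteration count is used as a placeholder exit criterion, and "minimal" worst slack is read here as the least negative (least violating) value, which is an interpretation.

```python
from typing import Callable, Dict, Tuple

Sizes = Dict[str, int]  # gate name -> index into that gate's sorted library sizes


def discrete_gate_sizing(
    initial: Sizes,
    num_sizes: Dict[str, int],
    worst_slack: Callable[[Sizes], float],
    assign_weights: Callable[[Sizes], object],
    inner_two_size: Callable[[Sizes, Sizes, object], Sizes],
    max_iters: int = 10,
) -> Tuple[Sizes, float]:
    """Outer loop: timing -> weights -> two-size inner solve, repeated."""
    current = dict(initial)
    best, best_ws = dict(current), worst_slack(current)
    for it in range(max_iters):          # placeholder exit criterion
        weights = assign_weights(current)
        step = 1 if it % 2 == 0 else -1  # alternate next larger / next smaller
        second = {g: min(max(s + step, 0), num_sizes[g] - 1)
                  for g, s in current.items()}
        current = inner_two_size(current, second, weights)
        ws = worst_slack(current)
        if ws > best_ws:                 # keep the least-violating iteration
            best, best_ws = dict(current), ws
    return best, best_ws


# Toy usage: slack is best when gate "a" has size index 2 and "b" has index 1.
target = {"a": 2, "b": 1}


def toy_ws(s: Sizes) -> float:
    return float(-sum(abs(s[g] - target[g]) for g in s))


def toy_inner(cur: Sizes, sec: Sizes, _weights: object) -> Sizes:
    # Stand-in for the min-cut solver: per gate, keep whichever candidate is closer.
    return {g: min((cur[g], sec[g]), key=lambda v: abs(v - target[g])) for g in cur}


best, ws = discrete_gate_sizing({"a": 0, "b": 0}, {"a": 3, "b": 3},
                                toy_ws, lambda s: None, toy_inner)
print(best, ws)  # {'a': 2, 'b': 1} 0.0
```

Because each gate only ever compares its current size with one neighbor, a gate can move by at most one size per iteration; the alternation of larger and smaller candidates lets it move in either direction over successive iterations.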

In one aspect of the present invention, a method to obtain the minimum sum of weighted delays in the netlist for a set of gates, wherein for each gate only the first gate size and the second gate size are considered, includes defining an equivalent flow graph for the netlist, computing a value of a first attribute for each node in the flow graph, wherein each node corresponds to one of the gates in the netlist, and computing a value of a second attribute for each arc between a pair of nodes in the flow graph, wherein each arc corresponds to the timing edges between the pair of gates to which the pair of nodes corresponds. The second attribute is assigned as the flow capacity of the arc for which it was computed. The method continues with placing a source arc between a source node and each node whose first attribute is positive, and placing a sink arc between a sink node and each node whose first attribute is negative. The flow capacity of each source arc is assigned the computed value of the first attribute of the node at which it is placed, and the flow capacity of each sink arc is assigned the negative of the computed value of the first attribute of the node at which it is placed. The method further continues with partitioning the flow graph into a source partition and a sink partition such that the sum of the flow capacities on all arcs cut by the partitioning is the minimum over all possible partitions. The method concludes with selecting, for the set of gate sizes, the first gate size for each of the gates whose corresponding node is in the source partition and the second gate size for each of the gates whose corresponding node is in the sink partition.
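The flow graph construction summarized above can be sketched as follows. Given first attributes $A_j$ and second attributes $B_{ij}$, it minimizes $\sum_j A_j S_j + \sum_{ij} B_{ij}|S_i - S_j|$ (the form derived as Eq. (32) in the Detailed Description) by a minimum cut. Assumptions: every $B_{ij}$ is nonnegative, as a min-cut formulation requires; each $B_{ij}$ arc is placed in both directions because $|S_i - S_j|$ is symmetric; and Edmonds-Karp is used as one standard max-flow algorithm, since the source does not name a specific one. The terminal node names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Tuple

SOURCE, SINK = "_source_", "_sink_"  # hypothetical terminal-node names


def two_size_min_cut(A: Dict[str, float],
                     B: Dict[Tuple[str, str], float]) -> Dict[str, int]:
    """Pick S_j in {0, 1} minimizing sum_j A_j S_j + sum_ij B_ij |S_i - S_j|.

    A must list every gate (A_j may be 0); every B_ij must be >= 0.
    Returns 0 (first size, source side) or 1 (second size, sink side) per gate.
    """
    cap: Dict[Tuple[str, str], float] = defaultdict(float)  # residual capacities
    nbrs: Dict[str, set] = defaultdict(set)

    def add_arc(u: str, v: str, c: float) -> None:
        cap[(u, v)] += c
        nbrs[u].add(v)
        nbrs[v].add(u)  # residual arcs may be traversed in reverse

    for j, a in A.items():
        if a > 0:
            add_arc(SOURCE, j, a)   # source arc with capacity A_j
        elif a < 0:
            add_arc(j, SINK, -a)    # sink arc with capacity -A_j
    for (i, j), b in B.items():
        add_arc(i, j, b)            # |S_i - S_j| is symmetric, so place
        add_arc(j, i, b)            # an arc of capacity B_ij each way

    def reachable_from_source() -> Dict[str, str]:
        parent = {SOURCE: None}
        queue = deque([SOURCE])
        while queue:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in parent and cap[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp max flow: augment along shortest residual paths.
    while True:
        parent = reachable_from_source()
        if SINK not in parent:
            break
        path, v = [], SINK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= push
            cap[(v, u)] += push

    # After max flow, nodes still reachable from the source form the source partition.
    source_side = reachable_from_source()
    return {j: 0 if j in source_side else 1 for j in A}


A = {"g1": 2.0, "g2": -3.0, "g3": 1.0}
B = {("g1", "g2"): 1.5, ("g2", "g3"): 0.5}
print(two_size_min_cut(A, B))  # {'g1': 0, 'g2': 1, 'g3': 0}
```

For this small example, enumerating all eight assignments confirms that $(S_1, S_2, S_3) = (0, 1, 0)$ gives the minimum objective value of $-1.0$; the cut value is $2.0$, which differs from the objective by the constant $\sum_{A_j < 0} A_j = -3.0$.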

In the above method, the value of the first attribute for each node is determined from an assigned weight and a plurality of delay coefficients, described below, associated with each of the timing edges incoming to and outgoing from the gate to which the node corresponds. Similarly, the value of the second attribute for each arc between a pair of nodes is determined from the assigned weight and selected ones of the delay coefficients for each of the timing edges between the pair of gates to which the pair of nodes corresponds.

The delay coefficients associated with each of the timing edges are determinable from a plurality of calculated delays between a driver gate and a set of receiver gates, for each combination of the driver gate being one of the first gate size and the second gate size and the set of receiver gates all being one of the first gate size and the second gate size.

As described above, the min/max path delay expression of Eq. (4) is limited in its application to typically sized netlists because of the number of gates and the number of discrete sizes for each gate available from libraries. When every possible combination of gate sizes is considered, the time required to reach a solution may disadvantageously be so excessive that a solution cannot be obtained in a reasonable time.

Also as described above, the continuous minimum sum of weighted delays expression of Eq. (5), although solvable in a reasonable time, is limited in its application to the discrete sizes available from a library. When a continuous solution for a driver and receiver gate on a timing edge is rounded, the rounding may disadvantageously select a less preferable size of driver and receiver, which may further increase delay on a critical path.

The present invention overcomes the above described disadvantages and limitations of the prior art by providing a novel discrete gate sizing method in which the minimum sum of weighted delays expression is used to solve a discrete domain problem through a reiterative process that considers only two sizes for each gate instance in each iteration. A feature of the present invention is that the reiterative process has an inner loop process and an outer loop process. The inner loop process is performed for each iteration of the outer loop process.

In the inner loop, the minimum sum of weighted delays problem, when only two possible gate sizes are considered for each gate, becomes solvable as a well known min-cut/max-flow problem whose solution is readily obtained in real time and is directly applicable to the discrete domain. A feature of the inner loop is that, once a solution is obtained, each gate in the netlist is assigned one of its two sizes.


In the outer loop, a starting set of gate sizes and a weight for each timing edge are assigned. One feature of the outer loop is that the starting set of gate sizes relates to the solution set of gate sizes of the inner loop from a prior iteration. Another feature of the outer loop, in another embodiment of the present invention, is that the weight assigned to each edge in each iteration is refined based on the weight of the prior iteration, such that the reiterative process converges more quickly to a preferred solution.
The present invention is able to optimize delays on critical paths by using the minimum sum of weighted delays expression, but advantageously applies it to the discrete domain by transforming the netlist into an equivalent flow graph for which optimization is readily obtained using well known min-cut/max-flow algorithms. One particular advantage is that the partitioning of the flow graph to find the optimum gate sizes is achievable in a reasonable time, proportional to $N^3$ or $N^2 E$, wherein $N$ and $E$ are the number of gates and the number of edges, respectively.

These and other objects, advantages and features of the present invention will become readily apparent to those skilled in the art from a study of the following Description of the Exemplary Preferred Embodiments when read in conjunction with the attached Drawing and appended Claims.

Brief Description of the Drawing

Fig. 1 (Prior Art) is a block diagram of an exemplary circuit useful to describe prior art gate sizing methods;

Fig. 2 (Prior Art) is a portion of the circuit of Fig. 1 showing available discrete gate sizes for each gate;

Fig. 3 (Prior Art) is a graph showing an exemplary solution to a continuous domain minimum sum of weighted delays as applied to two gates of Fig. 2;

Fig. 4 is a flowchart of a novel method in which a minimum sum of weighted delays solution is transformed into a solution of a min-cut/max-flow problem;

Fig. 5 is an exemplary flow graph defined in the flow graph defining step of Fig. 4;

Fig. 6 is a flowchart of the method to calculate the attributes of the attribute computing steps of Fig. 4;

Fig. 7 is a flowchart of the arc placing step of Fig. 4;

Fig. 8 is a flowchart of the gate size selecting step of Fig. 4;

Fig. 9 is a flowchart setting forth a novel gate sizing method according to the principles of the present invention;

Fig. 10 is a flowchart of the current gate size selecting step of Fig. 9; and

Fig. 11 is a flowchart of the current weight assigning step of Fig. 9.


Detailed Description of the Exemplary Preferred Embodiments 

Referring now to Fig. 4, there is shown a flowchart 40 of a novel method to select a set of gate sizes for a netlist wherein, for each one of the gates, only one of a discrete first gate size and a discrete second gate size is available for selection, such that the selection minimizes a sum of weighted delays over all timing edges in the netlist. As will become readily apparent from the following description, the solution to the minimum sum of weighted delays is transformed into a solution of a min-cut/max-flow problem. Thus, flowchart 40 relates to the broadest aspects of the inner loop of the reiterative process described above.

To describe the transformation of the solution of the minimum sum of weighted delays in the netlist, which may be any netlist having $N$ instances, $insts$, of gates, it is first assumed that the current size of all gates in the netlist is initially the first size, represented as $S = 0$, and that for each gate only the first size or the second size can be used, such that any gate that is resized assumes the second size, represented as $S = 1$. Accordingly, for each $j$-th gate in the netlist its size, $S_j$, may be defined as

$$S_j = \begin{cases} 0, & \text{if the } j\text{-th gate has the first size} \\ 1, & \text{if the } j\text{-th gate has the second size} \end{cases} \qquad (11)$$
The method of the present invention, practiced in accordance with flowchart 40, will result in a set of gate sizes, $S$, wherein

$$S = \{S_1, S_2, \ldots, S_j, \ldots, S_N\} \qquad (12)$$

which minimizes a sum of weighted delays, as set forth in Eq. (6), which, when combined with Eq. (12), may be rewritten as:

$$\min_{S \in \{0,1\}^N}\;\sum_{e \in E} w(e)\,delay(e) \qquad (13)$$

From Eq. (1) and Eq. (7), it follows that the delay, $delay(e)$, on each timing edge as set forth in Eq. (13) can be expressed as

$$f(x_{drv}, C_{tot}) = K(x_{drv}) + R(x_{drv})\,C_{tot} \qquad (14)$$

As stated above in conjunction with Eq. (1) and Eq. (7), the coefficients $K$ and $R$ depend upon the size of the driver gate for the timing edge, and $C_{tot}$ is the sum of the input capacitance, $C_{in}$, of each receiver gate on each outgoing edge from the driver gate and the wire capacitance on the net between the driver gate and the receiver gates. The input capacitance, $C_{in}$, also depends upon the size of its receiver gate, as stated above. Accordingly, the delay on each edge can be expressed as a function of the driver size and the sizes of the set of receiver gates, such that

$$delay(e) = f(S_{drv}, S_r) \qquad (15)$$

wherein $S_{drv}$ is the size of the driver gate and $S_r$ is a vector of the sizes of each receiver gate, $r$, in the set of receiver gates, $rec$, on the net, $n$, comprised of each outgoing timing edge from the driver gate, such that $r \in rec$ and $n \in nets$, wherein $nets$ is the set of all nets in the netlist.

From Eq. (14) and Eq. (15) and the description immediately above, and further given that the sizes $S \in \{0, 1\}$ for all of the gates, Eq. (14) may then be rewritten as

$$f\Big(S_{drv},\; C_{const} + \sum_{r \in rec} S_r\,\Delta C_r\Big) = K(S_{drv}) + R(S_{drv}) \sum_{r \in rec} S_r\,\Delta C_r \qquad (16)$$

wherein $C_{const}$ is the total capacitance, $C_{tot}$, on the net when $S_r = 0$ for every receiver, $\Delta C_r$ is the change of capacitance on the net when $S_r = 1$, and the constant term $R(S_{drv})\,C_{const}$ is absorbed into $K(S_{drv})$. The change of capacitance, $\Delta C_r$, on the net is expressed as the change of input capacitance of the receiver gate, $r$, since the wire capacitance on the net is assumed to be constant and therefore does not contribute to the change of capacitance on the net.

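It is readily apparent from Eq. (15) and Eq. (16) that for each timing edge there are four cases of delay, the second argument denoting the common size of the set of receiver gates, such that

$$\mathrm{delay}(0,0) = K(0), \qquad (17)$$

$$\mathrm{delay}(1,0) = K(1), \qquad (18)$$

$$\mathrm{delay}(0,1) = K(0) + R(0) \sum_{r \in rec} \Delta C_r, \text{ and} \qquad (19)$$

$$\mathrm{delay}(1,1) = K(1) + R(1) \sum_{r \in rec} \Delta C_r. \qquad (20)$$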

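Eqs. (17)-(20) can be rewritten to obtain expressions for each of the delay coefficients, K(0), K(1), R(0) and R(1), as follows:

$$K(0) = \mathrm{delay}(0,0), \qquad (21)$$

$$K(1) = \mathrm{delay}(1,0), \qquad (22)$$

$$R(0) = \frac{\mathrm{delay}(0,1) - \mathrm{delay}(0,0)}{\sum_{r \in rec} \Delta C_r}, \text{ and} \qquad (23)$$

$$R(1) = \frac{\mathrm{delay}(1,1) - \mathrm{delay}(1,0)}{\sum_{r \in rec} \Delta C_r}. \qquad (24)$$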
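By way of illustration only, the following Python sketch computes the delay coefficients of Eqs. (21)-(24) from the four delay cases and checks that the two-size model of Eqs. (17)-(20) reproduces them. The function names and the numeric values (assumed to be in ps and fF) are hypothetical and are not taken from any timing library.

```python
def delay_coefficients(d00, d01, d10, d11, delta_c_tot):
    """Return (K0, K1, R0, R1) from the four delay cases of Eqs. (17)-(20).

    d00 = delay(0,0), d01 = delay(0,1), d10 = delay(1,0), d11 = delay(1,1);
    delta_c_tot is the sum of Delta C_r over the receivers on the net.
    """
    if delta_c_tot == 0:
        raise ValueError("receiver capacitance change must be non-zero")
    k0 = d00                          # Eq. (21)
    k1 = d10                          # Eq. (22)
    r0 = (d01 - d00) / delta_c_tot    # Eq. (23)
    r1 = (d11 - d10) / delta_c_tot    # Eq. (24)
    return k0, k1, r0, r1


def model_delay(s_drv, s_r, coeffs, delta_c_tot):
    """Evaluate Eq. (16) with all receivers at the common size s_r."""
    k0, k1, r0, r1 = coeffs
    k = k1 if s_drv else k0
    r = r1 if s_drv else r0
    return k + r * s_r * delta_c_tot


# Illustrative numbers only; not from any real library.
coeffs = delay_coefficients(d00=50.0, d01=62.0, d10=40.0, d11=46.0, delta_c_tot=4.0)
print(coeffs)  # (50.0, 40.0, 3.0, 1.5)
```

Because K and R are fitted from exactly four samples, the model reproduces those four delays exactly; it is linear only in the total receiver capacitance change.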



Having obtained expressions for the delay coefficients, Eq. (16) can be written as

$$\mathrm{delay}(S_{drv}, S_r) = K(S_{drv}) + R(S_{drv}) \sum_{r \in rec} S_r\,\Delta C_r, \qquad (25)$$

or



Patent 
Dkt N . 00005-013 

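$$\mathrm{delay}(S_{drv}, S_r) = K(0) + \big(K(1) - K(0)\big)S_{drv} + \Big(R(0) + \big(R(1) - R(0)\big)S_{drv}\Big) \sum_{r \in rec} S_r\,\Delta C_r. \qquad (26)$$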

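Eq. (26) can be expanded and the resultant quadratic term S_drv S_r algebraically converted, knowing that S ∈ {0,1}, which implies S² = S, and using the following expressions

$$|S_{drv} - S_r| = (S_{drv} - S_r)^2 = S_{drv}^2 - 2S_{drv}S_r + S_r^2, \qquad (27)$$

$$|S_{drv} - S_r| = S_{drv} + S_r - 2S_{drv}S_r, \text{ and} \qquad (28)$$

$$S_{drv}S_r = 0.5\big(S_{drv} + S_r - |S_{drv} - S_r|\big), \qquad (29)$$

such that Eq. (26) becomes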

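$$\begin{aligned}
\mathrm{delay}(S_{drv}, S_r) = {} & K(0) + \big(K(1) - K(0)\big)S_{drv} - 0.5 \sum_{r \in rec} \big(R(0) - R(1)\big)\Delta C_r\,S_{drv} \\
& + R(0) \sum_{r \in rec} \Delta C_r\,S_r - 0.5 \sum_{r \in rec} \big(R(0) - R(1)\big)\Delta C_r\,S_r \\
& + 0.5 \sum_{r \in rec} \big(R(0) - R(1)\big)\Delta C_r\,|S_{drv} - S_r|. \qquad (30)
\end{aligned}$$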
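As a numerical check of the algebra (an illustrative sketch, not part of the described method), the following Python code evaluates Eq. (26) and Eq. (30) for every binary assignment of driver and receiver sizes on one net and confirms that they agree. The coefficient values and per-receiver capacitance changes are hypothetical.

```python
from itertools import product


def delay_eq26(s_drv, s_r, k0, k1, r0, r1, dcs):
    """Eq. (26): K(S_drv) + R(S_drv) * sum_r S_r * dC_r, linear in S_drv."""
    k = k0 + (k1 - k0) * s_drv
    r = r0 + (r1 - r0) * s_drv
    return k + r * sum(s * dc for s, dc in zip(s_r, dcs))


def delay_eq30(s_drv, s_r, k0, k1, r0, r1, dcs):
    """Eq. (30): the same delay with S_drv * S_r replaced using Eq. (29)."""
    total = k0 + (k1 - k0) * s_drv
    for s, dc in zip(s_r, dcs):
        total += -0.5 * (r0 - r1) * dc * s_drv            # driver-size term
        total += r0 * dc * s - 0.5 * (r0 - r1) * dc * s   # receiver-size terms
        total += 0.5 * (r0 - r1) * dc * abs(s_drv - s)    # coupling term
    return total


# Hypothetical coefficients and per-receiver capacitance changes.
k0, k1, r0, r1 = 50.0, 40.0, 3.0, 1.5
dcs = [1.0, 2.5, 0.5]
for s_drv in (0, 1):
    for s_r in product((0, 1), repeat=len(dcs)):
        x = delay_eq26(s_drv, s_r, k0, k1, r0, r1, dcs)
        y = delay_eq30(s_drv, s_r, k0, k1, r0, r1, dcs)
        assert abs(x - y) < 1e-9, (s_drv, s_r, x, y)
print("Eq. (30) matches Eq. (26) for all binary sizings")
```

The agreement holds only for sizes restricted to {0, 1}, since Eq. (29) relies on S² = S.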



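Eq. (26) can now be substituted for the sum of weighted delays expression in Eq. (13), wherein

$$\sum_{e \in E} w(e)\,\mathrm{delay}(e) = \sum_{n \in nets} \Big( \sum_{e \in n} w(e)\,\mathrm{delay}(S_{drv}, S_r) \Big), \qquad (31)$$

and Eq. (30) substituted into Eq. (31), wherein

$$\sum_{e \in E} w(e)\,\mathrm{delay}(e) = \sum_{j \in insts} A_j S_j + \sum_{i,j \in insts} B_{ij}\,|S_i - S_j| + Const, \qquad (32)$$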

wherein A_j is a first attribute associated with each jth one of the gates and B_ij is a second attribute associated with each timing edge between each ith one and jth one of the gates.
The derivation of A_j and B_ij in Eq. (32), resulting from the substitution of Eq. (30) into Eq. (31), is within the ordinary skill in the art. In Eq. (30), it is seen that all subexpressions are in the form of αS_j and β|S_i - S_j|, wherein α and β are known constant values for a given timing edge e. The expressions for A_j and B_ij in Eq. (32) are obtained by summing the corresponding α and β expressions in Eq. (30). Therefore, each of the first and second attributes A_j and B_ij is a function of the above described delay coefficients and of the weight for each timing edge.

More particularly, it is to be noted that each α that contributes to A_j is associated with either S_drv or S_r. Accordingly, each A_j associated with each jth one of the gates has one component for when such gate is a driver gate and another for when such gate is a receiver gate. It may conveniently be stated that, for each kth one of the gates, its first attribute A_k is expressible as a sum of a first increment A_i^incr associated with each respective one of the outgoing timing edges from the kth gate when the kth gate is an ith driver gate, and a second increment A_j^incr associated with each respective one of the incoming timing edges to the kth gate when the kth gate is a jth receiving gate, such that

$$A_i^{incr} = w(e)\big(K(1) - K(0)\big) - W\big(R(0) - R(1)\big)\Delta C_j / 2 \qquad (33)$$

and

$$A_j^{incr} = W\big(R(0) + R(1)\big)\Delta C_j / 2, \qquad (34)$$

wherein w(e) is the weight on each one of the timing edges from an ith driver gate to a jth receiver gate, W is the sum of the assigned weights w(e) on all outgoing timing edges from the ith driver gate, and ΔC_j is the difference in input capacitance between the second size and the first size for the jth receiver gate.

From summing the β expressions in Eq. (30), the attribute B_ij may be expressed as

$$B_{ij} = W\big(R(0) - R(1)\big)\Delta C_j / 2. \qquad (35)$$
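The per-edge contributions of Eqs. (33)-(35) can be transcribed directly, as in the following illustrative Python sketch. The function name and argument names are hypothetical; the formulas are those given above, including the use of w(e) on the K term and W on the R terms.

```python
def edge_terms(w_e, w_total, k0, k1, r0, r1, dc_j):
    """Per-edge contributions of Eqs. (33)-(35) for a timing edge i -> j.

    w_e:     weight w(e) of this timing edge.
    w_total: W, the sum of w(e) over all outgoing edges of driver i.
    dc_j:    input-capacitance change Delta C_j of receiver j.
    Returns (increment to A_i, increment to A_j, increment to B_ij).
    """
    a_drv = w_e * (k1 - k0) - w_total * (r0 - r1) * dc_j / 2   # Eq. (33)
    a_rcv = w_total * (r0 + r1) * dc_j / 2                      # Eq. (34)
    b_ij = w_total * (r0 - r1) * dc_j / 2                       # Eq. (35)
    return a_drv, a_rcv, b_ij


print(edge_terms(w_e=0.5, w_total=1.0, k0=50.0, k1=40.0, r0=3.0, r1=1.5, dc_j=2.0))
# (-6.5, 4.5, 1.5)
```

In this hypothetical case the negative driver increment lowers the objective of Eq. (36) when the driver takes the second size, while the positive receiver increment penalizes upsizing the receiver.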
Eq. (32) is then seen as an expression for the sum of weighted delays in Eq. (13), expressed as a function of gate size when only two sizes for each of the gates are considered. By substituting Eq. (32) into Eq. (13), the expression for the minimum sum of weighted delays becomes

$$\min \Big( \sum_{j \in insts} A_j S_j + \sum_{i,j \in insts} B_{ij}\,|S_i - S_j| \Big). \qquad (36)$$

It is the minimum sum of weighted delays, set forth in Eq. (36), for which the method of the present invention, set forth below in the description of the flowchart 40, obtains a solution.



With continued reference to Fig. 4 and additional reference to Fig. 5, the method of flowchart 40 (Fig. 4) includes a step 42 of defining for the netlist an equivalent flow graph 44 (Fig. 5). The flow graph 44 has a plurality of first nodes 46_1, ... 46_i, 46_j, ... 46_N, a plurality of first arcs 48_1,1, ... 48_ij, ... 48_j,N, a source node 50, a plurality of source arcs 52_src,1, ... 52_src,N, a sink node 54 and a plurality of sink arcs 56_1,snk, ... 56_N,snk. Each first node 46_i corresponds to a respective ith gate instance in the netlist, and each first arc 48_ij between an ith node 46_i and a jth node 46_j corresponds to a respective timing edge e between an ith gate instance and a jth gate instance in the netlist.



In the event the ith gate instance has two (or more) inputs, such as gate 12_1 (Fig. 1), two (or more) timing edges exist to the jth gate instance, such as gate 12_2, since a timing edge transitions at each respective input of the ith gate instance. It is to be understood that the flow graph 44 nevertheless contains only one first arc 48_ij between the ith first node 46_i and the jth first node 46_j.

As is known, each of the arcs in a flow graph, such as flow graph 44, is assigned a numerical value of a flow capacity. The following steps of flowchart 40 describe the computation and assignment of the flow capacity to each arc.
The method of flowchart 40 further includes a step 58 of computing a numerical value of the first attribute A_i for each ith first node 46_i and a step 60 of computing a numerical value of the second attribute B_ij for each first arc 48_ij. In the above derivation of the minimum sum of weighted delays, expressed in Eq. (36), the first attribute A_i was associated with the ith gate instance and the second attribute B_ij was associated with the timing edge between the ith gate instance and the jth gate instance. Because of the above stated correspondence between the nodes and arcs of the flow graph 44 and the gate instances and timing edges of the netlist, the first attribute A_i and the second attribute B_ij can now be given a numerical value at each ith first node 46_i and each first arc 48_ij, respectively.

In the broadest aspects of the present invention, the value of the first attribute A_i is determinable from an assigned weight and numerical values of a plurality of delay coefficients on each timing edge e outgoing from and incoming to an ith gate instance corresponding to each ith first node 46_i, wherein the value of the delay coefficients is obtained for each case of delay(S_drv, S_r) on each timing edge e. Similarly, the value of the second attribute B_ij for each first arc 48_ij is determinable from the weight w(e) on the corresponding timing edges e from the ith gate instance corresponding to each ith first node 46_i and the numerical values of selected ones of the delay coefficients on the corresponding timing edge between the ith gate instance and the jth gate instance corresponding to each jth first node 46_j.

As stated immediately above, the value of the delay coefficients is obtained for each case of delay(S_drv, S_r) on each timing edge e in the netlist from an ith gate instance. Since four cases of delay(S_drv, S_r) exist, the delay coefficients may, in one embodiment of the present invention, specifically include a first coefficient, a second coefficient, a third coefficient and a fourth coefficient, as set forth immediately below.
The numerical value of the first coefficient is proportional to the delay delay(e) on the timing edge e from the ith gate instance for the case delay(0,0), when the size of the driver gate is the current or first size, S_drv = 0, and the size of the set of receiver gates is the first size, S_r = 0. Accordingly, in a preferred embodiment of the present invention, the numerical value of the first coefficient may be computed from the expression of K(0) set forth above in Eq. (21).

The numerical value of the second coefficient is proportional to the delay delay(e) on the timing edge e from the ith gate instance for the case delay(1,0), when the size of the driver gate is the second size, S_drv = 1, and the size of the set of receiver gates is the first size, S_r = 0. Accordingly, in a preferred embodiment of the present invention, the numerical value of the second coefficient may be computed from the expression of K(1) set forth above in Eq. (22).

The numerical value of the third coefficient is proportional to the difference between the delay delay(e) on the timing edge e from the ith gate instance for the case delay(0,1), when the size of the driver gate is the current or first size, S_drv = 0, and the size of the set of receiver gates is the second size, S_r = 1, and the delay delay(e) for the case delay(0,0), when the size of the driver gate is the current or first size, S_drv = 0, and the size of the set of receiver gates is the first size, S_r = 0, this difference being divided by the change of input capacitance on the net seen from the ith gate instance between the size of the set of receiver gates being the second size and the first size. Accordingly, in a preferred embodiment of the present invention, the numerical value of the third coefficient may be computed from the expression of R(0) set forth above in Eq. (23).

The numerical value of the fourth coefficient is proportional to the difference between the delay delay(e) on the timing edge e from the ith gate instance for the case delay(1,1), when the size of the driver gate is the second size, S_drv = 1, and the size of the set of receiver gates is the second size, S_r = 1, and the delay delay(e) for the case delay(1,0), when the size of the driver gate is the second size, S_drv = 1, and the size of the set of receiver gates is the first size, S_r = 0, this difference being divided by the change of input capacitance on the net seen from the ith gate instance between the size of the set of receiver gates being the second size and the first size. Accordingly, in a preferred embodiment of the present invention, the numerical value of the fourth coefficient may be computed from the expression of R(1) set forth above in Eq. (24).

As described above, the first attribute A_k at any kth gate instance includes the summation of each A_i^incr set forth in Eq. (33) for each outgoing timing edge from the kth gate instance, being a driver gate, and the summation of each A_j^incr set forth in Eq. (34) for each incoming timing edge to the kth gate instance, being a receiver gate. Since the kth first node 46_k corresponds to the kth gate instance, the numerical value of the first attribute A_k associated with the kth first node 46_k may be computed from the expression for A_k associated with the kth gate instance. Accordingly, in a preferred embodiment of the present invention, a numerical value of the first attribute A_k for the kth first node 46_k may be computed from the expressions for A_i^incr and A_j^incr set forth above in Eqs. (33) and (34), respectively, wherein, at any kth first node 46_k, the total sum of its incremental values A_i^incr obtained when such kth first node 46_k was an ith first node 46_i (corresponding to the ith driver gate instance) is summed together with the total sum of its incremental values A_j^incr obtained when such kth first node 46_k was a jth first node 46_j (corresponding to the jth receiver gate instance).

Similarly, for the reasons described immediately above, a numerical value of the second attribute B_ij for each first arc 48_ij may, in a preferred embodiment of the present invention, be computed from the expression for B_ij set forth in Eq. (35). When the first arc 48_ij corresponds to multiple timing edges between the ith gate instance and the jth gate instance, the expression of Eq. (35) is used to obtain an incremental value for each such timing edge, and the incremental values are summed to obtain the value of the second attribute B_ij for each first arc 48_ij.

Referring now to Fig. 6, there is shown a flowchart 62 that sets forth a preferred iterative method for computing the value of the first attribute A_i at each ith first node 46_i and the value of the second attribute B_ij for each first arc 48_ij, as generally set forth above in the description of steps 58 and 60 of Fig. 4. The method of flowchart 62 is iterated from i = 1 to N for each ith first node 46_i and, within each ith iteration, an iteration is performed for each jth first node 46_j on each first arc 48_ij between the ith first node 46_i and each jth first node 46_j.

At each ith iteration, the method of flowchart 62 includes a step 64 of calculating a numerical value of the delay delay(e) on each timing edge e transitioning from the corresponding ith gate instance for each case of delay(S_drv, S_r). Each case of the delay is preferably calculated from library timing models for each of the first and second gate sizes of the ith driver gate instance and the set of jth receiver gate instances. The calculation of each case of delay, preferably using the expressions of Eqs. (17)-(20), results in four numerical delay values, d00 = delay(0,0), d01 = delay(0,1), d10 = delay(1,0) and d11 = delay(1,1), associated with each ith iteration.

The method of flowchart 62 further includes, at each ith iteration, a step 66 of calculating a value of each of the first, second, third and fourth delay coefficients for each timing edge e transitioning from the ith gate instance. The values of the first through fourth delay coefficients are preferably calculated using the expressions of Eqs. (21)-(24), respectively, and the values of delay calculated in step 64 above. Writing ΔC_tot for the sum of ΔC_r over all r ∈ rec, the calculation of the first, second, third and fourth delay coefficients results in four delay coefficient values for each ith iteration: K0 = d00, K1 = d10, R0 = (d01 - d00)/ΔC_tot and R1 = (d11 - d10)/ΔC_tot.

At each ith iteration, the method of flowchart 62 further includes a step 68 of calculating a value of the first increment A_i^incr described above for each ith first node 46_i. The value of A_i^incr is preferably computed from the expression of Eq. (33) and from the values of the delay coefficients obtained in the present ith iteration of step 66 above. More particularly, within each ith iteration, the first increment computed on the iteration for each jth node has the value A_i^incr = w(e)(K1 - K0) - W(R0 - R1)ΔC_j/2, and the value of each A_i^incr from each iteration for the jth node is accumulated within the present ith iteration to obtain the resultant value of the first attribute A_i for the ith first node 46_i.

Also during the present ith iteration, at each iteration for each jth node, the method of flowchart 62 further includes a step 70 of calculating a value of the second increment A_j^incr described above for each jth first node 46_j. The value of A_j^incr is obtained from the expression of Eq. (34) and from the values of the delay coefficients obtained in step 66 above for the present ith iteration. Accordingly, in the present ith iteration, the second increment computed at each iteration for the jth first node 46_j has the value A_j^incr = W(R0 + R1)ΔC_j/2, and the values of A_j^incr computed in the present ith iteration are accumulated with any other value of A_j^incr for the jth first node 46_j from any kth iteration of the method of flowchart 62. As stated above, the values of the first increment A_i^incr and the second increment A_j^incr accumulate at each first node 46, so that a resultant value of the first attribute A_k accumulates at any first node 46_k.
In the present ith iteration, the method of flowchart 62 further includes a step 72 of calculating a value of the second attribute B_ij for each first arc 48_ij outgoing from the current ith first node 46_i. The value of B_ij is obtained from the expression of Eq. (35) and the values of the delay coefficients obtained in step 66 above in the present ith iteration. When the first arc 48_ij corresponds to multiple timing edges between the ith gate instance and the jth gate instance, the expression of Eq. (35) is used to obtain an incremental value W(R0 - R1)ΔC_j/2 for each such timing edge, and each incremental value is accumulated to obtain the value of the second attribute B_ij for each first arc 48_ij.

Since in each ith iteration of the method of flowchart 62 the full value of the second attribute B_ij has been accumulated, the method may at this time further include a step 74 of assigning each value of the second attribute B_ij calculated in step 72 in the present ith iteration as a flow capacity capacity(i,j) to each corresponding first arc 48_ij in the flow graph 44. Accordingly, capacity(i,j) = B_ij.

At step 76, a determination is made whether in the present ith iteration there is another jth first node 46_j. If YES, steps 68, 70, 72 and 74 are repeated for the next jth first node 46_j.

Otherwise, if NO, at step 78 a determination is made whether i < N. If YES, step 64 and all subsequent steps of flowchart 62 are performed as above for the next (i+1)th iteration. If NO, the next step of flowchart 40 (Fig. 4) is performed.

Returning to Fig. 4, the next step in the method of the flowchart 40 is the step 80 of placing the source arcs 52_src,i and sink arcs 56_i,snk in the flow graph 44. Generally, each source arc 52_src,i is placed between the source node 50 and each respective ith first node 46_i for which the accumulated value of its first attribute A_i is positive. Similarly, each sink arc 56_i,snk is placed between the sink node 54 and each respective ith first node 46_i for which the accumulated value of its first attribute A_i is negative. The flow capacities, capacity(source,i) and capacity(i,sink), are then assigned based on the value of the first attribute A_i of the ith first node 46_i.

A preferred implementation of the step 80 is shown in Fig. 7. At step 82, the accumulated value A_i is obtained for each ith first node 46_i, wherein i = 1 to N. At step 84, a decision is made whether A_i > 0.

If the decision at step 84 is YES, then at step 86 a source arc 52_src,i is placed between the source node 50 and the ith first node 46_i. The source arc 52_src,i is assigned a capacity capacity(source,i) = A_i.

If the decision at step 84 is NO, then at step 88 a sink arc 56_i,snk is placed between the ith first node 46_i and the sink node 54. The sink arc 56_i,snk is assigned a capacity capacity(i,sink) = -A_i.

In either event, the method continues to step 90, whereat a decision is made whether i < N. If YES, an iteration for the next ith node 46_i commences at step 82. If NO, the next step of flowchart 40 is performed.
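The arc placement of Fig. 7 reduces to a sign test on each accumulated A_i. In the following illustrative Python sketch (the names are hypothetical), a node with A_i = 0 takes the NO branch of step 84 and receives a sink arc of zero capacity, which does not affect any cut.

```python
def place_terminal_arcs(a):
    """Fig. 7 sketch: return (source_caps, sink_caps) from accumulated A_i.

    a: first node -> accumulated first attribute A_i.
    source_caps[i] = capacity(source, i) = A_i   when A_i > 0  (step 86)
    sink_caps[i]   = capacity(i, sink)   = -A_i  otherwise     (step 88)
    """
    source_caps, sink_caps = {}, {}
    for i, a_i in a.items():        # steps 82 and 90: visit i = 1 to N
        if a_i > 0:                 # step 84
            source_caps[i] = a_i
        else:
            sink_caps[i] = -a_i
    return source_caps, sink_caps


src, snk = place_terminal_arcs({"g1": -13.0, "g2": 4.5, "g3": 4.5})
print(src)  # {'g2': 4.5, 'g3': 4.5}
print(snk)  # {'g1': 13.0}
```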

The method of flowchart 40 further includes a step 92 of partitioning the first nodes 46_1, ... 46_i, 46_j, ... 46_N into a source partition 94 and a sink partition 96, as best seen in Fig. 5. The partitioning is made by a cut, as indicated at 98, such that the sum of the capacities on each of the source arcs 52_src,1, ... 52_src,N, sink arcs 56_1,snk, ... 56_N,snk and first arcs 48_1,1, ... 48_ij, ... 48_j,N on the cut is a minimum over all possible partitions. Those skilled in the art will recognize that Eq. (36) is equivalent to a min-cut/max-flow problem, solvable using a push-relabel algorithm for which a solution may be found in O(N³) or O(N²E) time, as is known in the art and specifically taught by Cherkassky et al., "On Implementing the Push-Relabel Method for the Maximum Flow Problem," Algorithmica, vol. 19, pp. 390-410, 1997. In a preferred embodiment of the present invention, a push-relabel algorithm is used to obtain the cut 98.
The method of flowchart 40 concludes with a step 100 of selecting the current, or first, gate size for each jth gate for which the corresponding jth first node 46_j is in the source partition 94. The step 100 also includes selecting the second size for each jth gate for which the corresponding jth first node 46_j is in the sink partition 96. The set of gate sizes resulting from this step 100 satisfies Eq. (36).

Referring to Fig. 8, there is shown a preferred implementation of the step 100, repeated for j = 1 to N. At step 102, a decision is made whether the current jth first node 46_j is in the sink partition 96.

If the decision at step 102 is YES, then at step 104 the instance of the gate corresponding to the jth node 46_j is selected to be the second size. Otherwise, if the decision is NO, then at step 106 the jth instance of the gate corresponding to the jth node 46_j is selected to be the current or first size.

In either event, the method continues to step 108, whereat a decision is made whether j < N. If YES, an iteration for the next jth node 46_j commences at step 102. If NO, the solution to Eq. (36) has been obtained.
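The following self-contained Python sketch ties steps 80, 92 and 100 together for a small instance and compares the result with an exhaustive search over Eq. (36). Two points are assumptions of the sketch rather than statements of the method: it substitutes the simpler Edmonds-Karp (shortest augmenting path) maximum-flow algorithm for the push-relabel algorithm cited above, and it gives each first arc capacity B_ij in both directions, so that the cut charges B_ij whenever S_i and S_j differ, as the |S_i - S_j| term requires. It also assumes B_ij ≥ 0, that is, R(0) ≥ R(1). The sentinel node names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import product

SRC, SNK = "__source__", "__sink__"   # hypothetical names for nodes 50 and 54


def min_cut_sizes(a, b):
    """Choose S_j in {0, 1} minimizing Eq. (36) via a minimum s-t cut.

    a: node -> A_j (every node must appear here).
    b: (i, j) -> B_ij, assumed non-negative.
    Returns node -> 0 (source partition, first size) or 1 (sink, second size).
    """
    cap = defaultdict(lambda: defaultdict(float))
    for j, a_j in a.items():            # step 80 / Fig. 7
        if a_j > 0:
            cap[SRC][j] += a_j          # cut (cost A_j) when S_j = 1
        else:
            cap[j][SNK] += -a_j         # cut (cost -A_j) when S_j = 0
    for (i, j), b_ij in b.items():      # first arcs, both directions
        cap[i][j] += b_ij
        cap[j][i] += b_ij

    # Step 92 (sketch): Edmonds-Karp max flow on residual capacities.
    while True:
        parent = {SRC: None}
        queue = deque([SRC])
        while queue and SNK not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if SNK not in parent:
            break                       # no augmenting path: flow is maximal
        path, v = [], SNK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Step 100 / Fig. 8: nodes reachable in the residual graph form the
    # source partition (first size); all others take the second size.
    return {j: 0 if j in parent else 1 for j in a}


def objective(sizes, a, b):
    """Eq. (36) without the additive constant."""
    return (sum(a_j * sizes[j] for j, a_j in a.items())
            + sum(b_ij * abs(sizes[i] - sizes[j]) for (i, j), b_ij in b.items()))


def brute_force(a, b):
    """Exhaustive reference solution, practical only for a handful of gates."""
    nodes = list(a)
    best = min(product((0, 1), repeat=len(nodes)),
               key=lambda s: objective(dict(zip(nodes, s)), a, b))
    return dict(zip(nodes, best))


a = {"g1": -13.0, "g2": 4.5, "g3": 4.5}
b = {("g1", "g2"): 1.5, ("g1", "g3"): 1.5}
sizes = min_cut_sizes(a, b)
print(sizes)                    # {'g1': 1, 'g2': 0, 'g3': 0}
print(objective(sizes, a, b))   # -10.0
```

The minimum cut value here is 3.0, which equals the optimal objective, -10.0, minus the sum of the negative A_j values, -13.0; that offset is the constant introduced by the sink arcs.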

From Eq. (36) it can be seen that the foregoing method has obtained the minimum sum of weighted delays for the two-gate-size problem. Since the cut 98 is made to minimize the sum of flow capacities on the cut arcs, and these flow capacities are assigned the values of B_ij for these arcs, computed from the weights and delays on the corresponding timing edges, it follows that the summation of B_ij|S_i - S_j| over the corresponding timing edges is, by definition, minimal. Similarly, the summation of A_jS_j is also minimal, since only when A_j < 0 is there a contribution to this summation: for positive values of A_j the corresponding gate size is S_j = 0, and therefore A_jS_j = 0.
Of course, a practical netlist uses gate libraries in which there are more than two sizes. The following description sets forth a method to which the two-gate-size method described above is applicable. Generally, a series of iterations using all the available gate sizes may be performed, wherein only two of the gate sizes are used in each iteration, as above. At the end of each iteration, a resultant set of gate sizes, chosen from those two gate sizes, that satisfies Eq. (36) is obtained. In the next iteration, all gate sizes from the prior iteration are resized either up or down, and the two-gate-size method described above is performed again. When all possible gate sizes have been considered, or some other exit criterion determined over all such possible iterations has been met, a set of gate sizes that satisfies Eq. (4) may be determined.

Referring now to Fig. 9, there is shown a flowchart 110 of an iterative process for a method to select a set x of gate sizes for a netlist having N gates, wherein for each ith one of the gates a predetermined number of discrete gate sizes X_i is available from a library for use in the netlist. Accordingly, the set x of gate sizes may be selected to satisfy the expression of Eq. (4).

Each iteration of the method of flowchart 110 includes a step 112 of selecting a current first gate size X for each instance insts of the gates and an available second gate size for each instance. At an initial iteration of the selecting step 112, the current first gate size X for each instance is selected to be an initially selected one of the available library gate sizes. At each subsequent iteration of the selecting step 112, the current first gate size X for each instance is selected to be the resultant gate size for the same instance from the immediately prior iteration of the flowchart 110, as described below. The availability of the second gate size from the library for each instance is used in a subsequent step described below.

After the current set x of gate sizes is selected, the method of flowchart 110 includes a step 114 of performing a timing analysis and assigning a set of weights w. Each current weight w(e) in the set of weights w is associated with a respective timing edge e in the netlist. The timing analysis determines slack and worst slack in the netlist. As described above, the current weight w(e) is a function of a current worst slack determined for the netlist using the current first gate size.

At step 116, the method of flowchart 110 includes the step of selecting a new gate size X for each instance insts of the gates from the current first gate size and the second gate size identified above, such that the set of new gate sizes obtains a minimum sum of weighted delays. When the current first gate size is expressed as S = 0 and the second gate size is expressed as S = 1, the step 116 is preferably performed in accordance with the above described method of Fig. 4, wherein the minimum sum of weighted delays is obtained as a solution to a min-cut problem using the two gate sizes. Accordingly, in one embodiment of the present invention, the set of new gate sizes resulting from the performance of step 116 satisfies the expression of Eq. (36).

At step 118, a decision is made whether an exit criterion has been reached. If NO, a next iteration of the above described steps of flowchart 110 is performed. In the next iteration of the flowchart 110, the new gate size selected in the present iteration becomes the current gate size in the current gate size selecting step 112 of the next iteration.

A YES decision at step 118 indicates that an exit criterion has been met. Upon exit, the set x of gate sizes X is selected from the iteration for which the current worst slack was determined at step 114 to be minimal. The exit criterion can be based upon various factors, such as a total number of iterations, or successive iterations indicating that the set of weights has begun to converge, as is described in greater detail below, which indicates that path delays have been optimized or that worst slack cannot be further improved. A total number of iterations can be based upon a maximum number or some other number relating to the largest number of gate sizes available for any one instance.
Referring to Fig. 10, there is shown a preferred embodiment of the first and second gate size selecting step 112. At step 120, a decision is made whether the current iteration is an initial iteration. If YES, the initial set x of gate sizes X for each instance insts of the gates is selected from the library, as indicated at step 122. If NO, the set x of new gate sizes X for each instance insts of the gates from the immediately prior iteration of the new gate size selecting step 116 is selected as the set of current first gate sizes, as indicated at step 124.

In either event, a decision is made at step 126 whether the current iteration is an even-numbered or odd-numbered iteration. If EVEN, then at step 128 the second gate size for the current iteration of the process described in flowchart 110 is set to be the next available larger size from the library. If ODD, then at step 130 the second gate size for the current iteration of the process described in flowchart 110 is set to be the next smaller size from the library.

If the second size is selected to be the next larger size at step 128, an inquiry is made, as indicated at step 132, whether such next larger size is available for each ith gate instance. If YES, then processing continues to the weight assigning step 114 of Fig. 9 described above.

Similarly, if the second size is selected to be the next smaller size at step 130, an inquiry is made, as indicated at step 134, whether such next smaller size is available for each ith gate instance. If YES, then processing continues to the weight assigning step 114 of Fig. 9 described above.

In either event, if the decision at step 132 or step 134 is NO, then, as indicated at step 135, for any ith gate instance for which the second size is not available in the library, the first gate size is maintained throughout the performance of the new gate size selecting step 116.
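Steps 126 through 135 of Fig. 10 can be sketched per gate instance as follows. The sketch assumes, hypothetically, that each library cell's sizes are indexed 0 to num_sizes - 1 in increasing order and that iterations are numbered from zero, so that the initial iteration counts as even.

```python
def second_size_index(current, num_sizes, iteration):
    """Return the library index of the second size for one gate instance.

    current:   index of the current first size (0 = smallest, assumed order).
    num_sizes: number of sizes available for this cell in the library.
    iteration: iteration counter of flowchart 110, assumed to start at 0.
    """
    # Step 126: even -> next larger size (step 128); odd -> next smaller (step 130).
    step = 1 if iteration % 2 == 0 else -1
    candidate = current + step
    if 0 <= candidate < num_sizes:   # steps 132 / 134: is that size available?
        return candidate
    return current                   # step 135: keep the first size


print([second_size_index(c, 4, iteration=0) for c in range(4)])  # [1, 2, 3, 3]
print([second_size_index(c, 4, iteration=1) for c in range(4)])  # [0, 0, 1, 2]
```

Returning the current index when no second size exists makes both sizes identical for that instance, so the two-size selection of step 116 cannot change it.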
Referring now to Fig. 11, there is shown a preferred embodiment of the timing analysis and weight assigning step 114 of Fig. 9. A static timing analysis to determine slack is well known and need not be further described. As indicated at step 136, the weight w(e) for each associated timing edge e may be determined as a function of the slack on each associated timing edge e and the worst slack. For example, the weight w(e) for each associated timing edge e may be determined in accordance with the expression

$$w(e) = \frac{1}{dw + \big(slack(e) - WS\big)} \qquad (37)$$

wherein slack(e) is the slack on each associated timing edge e, WS is the worst slack in the netlist, and dw is a number greater than zero such that the denominator does not go to zero when the slack on a timing edge equals the worst slack. Accordingly, the most critical edges receive the largest weights.
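The weight of Eq. (37) can be computed as in the following Python sketch. The dictionary layout and the default dw = 1.0 are assumptions; the normalization of step 138 is not shown, since the text does not specify how it is carried out.

```python
def edge_weights(slack, dw=1.0):
    """Weights per Eq. (37): w(e) = 1 / (dw + (slack(e) - WS)).

    slack: timing edge -> slack(e) from static timing analysis; assumed to
           cover every timing edge, so its minimum is the worst slack WS.
    dw:    positive constant keeping the denominator away from zero.
    """
    if dw <= 0:
        raise ValueError("dw must be greater than zero")
    ws = min(slack.values())
    return {e: 1.0 / (dw + (s - ws)) for e, s in slack.items()}


w = edge_weights({"e1": -20.0, "e2": -5.0, "e3": 10.0}, dw=1.0)
print(w["e1"], w["e2"])  # 1.0 0.0625
```

The worst-slack edge always receives the maximum weight 1/dw, so dw also sets how sharply the weighting favors critical edges.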

As indicated at step 138, the weight w(e) on each associated timing edge e is normalized such that, at each one of the gates, the sum of the weights w(e) on the incoming timing edges e is equal to the sum of the weights w(e) on the outgoing timing edges e.

As indicated at step 140, the weight w(e) for each associated timing edge e is updated as a function of the prior weight assigned to the same timing edge e in the immediately prior iteration. Updating the weights allows the weights on each edge to converge faster, resulting in fewer iterations of the method of Fig. 9. For example, the weights may be updated in accordance with the expression

$$w(e) = (1 - \alpha)\,w_{prev}(e) + \alpha\,w_{new}(e) \qquad (38)$$

wherein α is a number between zero and one, w_prev(e) is the prior weight, and w_new(e) is the current weight prior to the updating step 140.
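The weight update of step 140 is an exponential smoothing of successive weight sets. A minimal Python sketch follows; treating an edge with no prior weight as taking its new weight unchanged is an assumption for the first iteration, not something stated in the text.

```python
def update_weights(w_prev, w_new, alpha):
    """Smooth weights: w(e) = (1 - alpha) * w_prev(e) + alpha * w_new(e).

    w_prev: weights from the immediately prior iteration (may be empty).
    w_new:  weights just computed in the current iteration.
    alpha:  blending factor between zero and one.
    Edges missing from w_prev keep w_new(e) (an assumption for iteration one).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between zero and one")
    return {e: ((1.0 - alpha) * w_prev[e] + alpha * wn) if e in w_prev else wn
            for e, wn in w_new.items()}


print(update_weights({"e1": 1.0}, {"e1": 0.5, "e2": 0.2}, alpha=0.5))
# {'e1': 0.75, 'e2': 0.2}
```

A smaller alpha damps oscillation between iterations at the cost of slower response to changes in slack.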

There have been described above exemplary preferred embodiments for selecting a set of discrete gate sizes for a netlist. Those skilled in the art may now make numerous uses of, and departures from, the above described embodiments without departing from the inventive principles disclosed herein. Accordingly, the present invention is to be defined solely by the lawfully permitted scope of the appended Claims.